Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Nowadays datasets
often have upwards of thousands---sometimes tens or hundreds of thousands---of
variables and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis. Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures
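The regime this abstract describes (thousands of variables, far fewer samples) can be made concrete with a short simulation. The sketch below generates data from a sparse linear Gaussian SEM, which is one standard model for this setting; it illustrates the data-generating scenario only and is not the sparsebn package's API (all function and variable names here are invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_sparse_sem(n, d, edge_prob=0.01):
    """Simulate n samples from a sparse linear Gaussian SEM X = X B + E,
    where B is a strictly upper-triangular (hence acyclic) coefficient
    matrix with few nonzero entries. Illustrative only -- this shows the
    data-generating setting, not the sparsebn package's interface."""
    B = np.triu(rng.uniform(0.5, 1.5, size=(d, d)), k=1)
    B *= rng.random((d, d)) < edge_prob            # keep ~1% of candidate edges
    E = rng.normal(size=(n, d))                    # Gaussian noise
    X = E @ np.linalg.inv(np.eye(d) - B)           # solves X = X B + E
    return X, B

# Far fewer samples (n = 50) than variables (d = 1000).
X, B = simulate_sparse_sem(n=50, d=1000)
print(X.shape)  # (50, 1000)
```

Because B is strictly upper triangular, the implied graph is guaranteed to be a DAG, and I - B is always invertible.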
Learning nonparametric latent causal graphs with unknown interventions
We establish conditions under which latent causal graphs are
nonparametrically identifiable and can be reconstructed from unknown
interventions in the latent space. Our primary focus is the identification of
the latent structure in measurement models without parametric assumptions such
as linearity or Gaussianity. Moreover, we do not assume the number of hidden
variables is known, and we show that at most one unknown intervention per
hidden variable is needed. This extends a recent line of work on learning
causal representations from observations and interventions. The proofs are
constructive and introduce two new graphical concepts -- imaginary subsets and
isolated edges -- that may be useful in their own right. As a matter of
independent interest, the proofs also involve a novel characterization of the
limits of edge orientations within the equivalence class of DAGs induced by
unknown interventions. These are the first results to characterize the
conditions under which causal representations are identifiable without making
any parametric assumptions in a general setting with unknown interventions and
without faithfulness. Comment: To appear at NeurIPS 202
A super-polynomial lower bound for learning nonparametric mixtures
We study the problem of learning nonparametric distributions in a finite
mixture, and establish a super-polynomial lower bound on the sample complexity
of learning the component distributions in such models. Namely, we are given
i.i.d. samples from where and we are interested in learning each component .
Without any assumptions on , this problem is ill-posed. In order to
identify the components , we assume that each can be written as a
convolution of a Gaussian and a compactly supported density with
. Our main result shows
that
samples are required for estimating each . The proof relies on a fast rate
for approximation with Gaussians, which may be of independent interest. This
result has important implications for the hardness of learning more general
nonparametric latent variable models that arise in machine learning
applications.
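The mixture model above can be sketched concretely. Assuming uniform densities as the compactly supported components (an illustrative stand-in; the setting allows any compactly supported density), a minimal Python simulation of i.i.d. draws from such a Gaussian-convolved mixture looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, weights, supports, sigma=1.0):
    """Draw n i.i.d. samples from f = w_1 f_1 + ... + w_m f_m, where each
    component f_i is the convolution of a N(0, sigma^2) Gaussian with a
    compactly supported density -- here uniform on [-supports[i], supports[i]],
    chosen purely for illustration."""
    w = np.asarray(weights, dtype=float)
    a = np.asarray(supports, dtype=float)
    comp = rng.choice(len(w), size=n, p=w / w.sum())   # pick a component
    latent = rng.uniform(-a[comp], a[comp])            # compactly supported part
    return latent + rng.normal(0.0, sigma, size=n)     # Gaussian convolution

x = sample_mixture(10_000, weights=[0.4, 0.6], supports=[0.5, 2.0])
print(x.shape)  # (10000,)
```

Adding the latent compactly supported draw to independent Gaussian noise is exactly sampling from the convolution of the two densities.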
DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization
The combinatorial problem of learning directed acyclic graphs (DAGs) from
data was recently framed as a purely continuous optimization problem by
leveraging a differentiable acyclicity characterization of DAGs based on the
trace of a matrix exponential function. Existing acyclicity characterizations
are based on the idea that powers of an adjacency matrix contain information
about walks and cycles. In this work, we propose an acyclicity characterization based on the log-determinant (log-det)
function, which leverages the nilpotency property of DAGs. To deal with the
inherent asymmetries of a DAG, we relate the domain of our log-det
characterization to the set of M-matrices, which is a key difference relative
to the classical log-det function defined over the cone of positive definite
matrices. Similar to acyclicity functions previously proposed, our
characterization is also exact and differentiable. However, when compared to
existing characterizations, our log-det function: (1) Is better at detecting
large cycles; (2) Has better-behaved gradients; and (3) Its runtime is in
practice about an order of magnitude faster. From the optimization side, we
drop the typically used augmented Lagrangian scheme, and propose DAGMA
(DAGs via M-matrices for Acyclicity), a method
that resembles the central path for barrier methods. Each point in the central
path of DAGMA is a solution to an unconstrained problem regularized by our
log-det function, and we show that in the limit of the central path the
solution is guaranteed to be a DAG. Finally, we provide extensive experiments
for linear and nonlinear SEMs, and show that our approach
can reach large speed-ups and smaller structural Hamming distances against
state-of-the-art methods.Comment: To appear at NeurIPS 202
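As a rough illustration of the log-det idea described above, the sketch below implements an acyclicity function of the form h(W) = -log det(sI - W∘W) + d·log s, with W∘W the elementwise square, a form consistent with the abstract's description; the choice s = 1 and the example matrices are illustrative assumptions. For a DAG, W∘W is nilpotent, so det(sI - W∘W) = s^d and h(W) = 0, while a cycle makes h(W) strictly positive.

```python
import numpy as np

def h_logdet(W, s=1.0):
    """Log-det acyclicity function h(W) = -log det(sI - W*W) + d log s,
    where W*W is the elementwise square. On its M-matrix domain, h(W) = 0
    exactly when W is the adjacency matrix of a DAG (W*W nilpotent)."""
    d = W.shape[0]
    sign, logabsdet = np.linalg.slogdet(s * np.eye(d) - W * W)
    assert sign > 0, "W outside the domain: sI - W*W must be an M-matrix"
    return -logabsdet + d * np.log(s)

# Strictly upper-triangular W is a DAG: det(sI - W*W) = s^d, so h = 0.
W_dag = np.array([[0.0, 0.5, 0.3],
                  [0.0, 0.0, 0.7],
                  [0.0, 0.0, 0.0]])
print(np.isclose(h_logdet(W_dag), 0.0))  # True

# A 2-cycle between the first two nodes makes h strictly positive.
W_cyc = np.array([[0.0, 0.5, 0.0],
                  [0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
print(h_logdet(W_cyc) > 0.0)  # True
```

Unlike the trace-of-matrix-exponential characterization, the log-det form weights walks of all lengths through the determinant, which is the property the abstract credits for better detection of large cycles.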